Potential performance improvements #16

rubdos · 2024-07-07T15:58:12Z

I'm looking for some improvements; so far there is no measurable effect. At least we're allocating less frequently!
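As an illustration of allocating less often: the length of a BlurHash string is fully determined by its component counts (one size-flag character, one quantised-maximum-AC character, four DC characters, and two characters per AC component). The output can therefore be allocated once, up front. The sketch below assumes that layout; `blurhash_len` and `with_exact_capacity` are hypothetical helpers, not the crate's code.

```rust
/// Number of characters in a BlurHash with `nx` x `ny` components:
/// 1 (size flag) + 1 (quantised maximum AC) + 4 (DC) + 2 per AC component.
fn blurhash_len(nx: usize, ny: usize) -> usize {
    // BlurHash allows 1..=9 components per axis.
    assert!((1..=9).contains(&nx) && (1..=9).contains(&ny));
    1 + 1 + 4 + 2 * (nx * ny - 1)
}

/// Hypothetical helper: reserve the exact final length once, so appending
/// the encoded ASCII characters never has to reallocate.
fn with_exact_capacity(nx: usize, ny: usize) -> String {
    String::with_capacity(blurhash_len(nx, ny))
}
```

For example, `LGF5]+Yk^6#M@-5c,1J5@[or[Q6.` (the hash used in the benchmarks) has 4x3 components and is exactly `blurhash_len(4, 3) == 28` characters long.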

rubdos · 2024-07-07T17:15:05Z

Things I could find by eye so far:

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [2.7536 ms 2.7607 ms 2.7682 ms]
                        change: [-6.8198% -6.4337% -6.0751%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [2.7061 ms 2.7117 ms 2.7181 ms]
                        change: [-9.6423% -9.3190% -9.0130%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  14 (14.00%) high severe

     Running benches/encode.rs (target/release/deps/encode-a9d404ceb3500bd9)
encode data/octocat.png time:   [2.8060 ms 2.8099 ms 2.8146 ms]
                        change: [+0.4714% +0.7873% +1.0953%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 18 outliers among 100 measurements (18.00%)
  10 (10.00%) high mild
  8 (8.00%) high severe

Any further improvements will require some more fine-grained benchmarking.

rubdos · 2024-07-08T09:06:44Z

Benchmarking decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [1.2540 ms 1.2562 ms 1.2585 ms]
                        change: [-59.901% -59.516% -58.875%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  8 (8.00%) high mild
  2 (2.00%) high severe

     Running benches/encode.rs (target/release/deps/encode-104439108152a4b3)
encode data/octocat.png time:   [2.6665 ms 2.6696 ms 2.6731 ms]
                        change: [-11.459% -11.268% -11.058%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

Focussed mainly on decoding, since we use that way more often than encoding anyway.

Co-authored-by: Thibaut Vandervelden <thvdveld@vub.be>
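Most of the decode-side commits in this PR (precomputing the width/height inverses, pulling `pi*x/width` out of the hot loop, then using precomputation tables for cosines) rest on one observation. Each basis factor `cos(pi * i * x / width)` depends only on the column `x` and the component `i`, and likewise for rows, so the per-pixel inner loop can read factors from small tables instead of calling `cos`. A minimal single-channel sketch of that idea follows; `cosine_table`, `decode_channel`, and the row-major factor layout are assumptions for illustration, not the crate's API.

```rust
use std::f32::consts::PI;

/// Precompute cos(PI * i * x / len) for every coordinate x in 0..len and
/// component i in 0..components. Layout: table[x * components + i], so the
/// factors for one pixel column (or row) are contiguous.
fn cosine_table(len: usize, components: usize) -> Vec<f32> {
    let mut table = Vec::with_capacity(len * components);
    for x in 0..len {
        // PI * x / len is computed once per coordinate, not per component.
        let base = PI * x as f32 / len as f32;
        for i in 0..components {
            table.push((base * i as f32).cos());
        }
    }
    table
}

/// Reconstruct one linear colour channel from `nx * ny` factors stored
/// row-major as factors[j * nx + i]. Only W*nx + H*ny cosines are computed
/// in total, instead of 2*nx*ny per pixel.
fn decode_channel(factors: &[f32], nx: usize, ny: usize, width: usize, height: usize) -> Vec<f32> {
    assert_eq!(factors.len(), nx * ny);
    let cos_x = cosine_table(width, nx);
    let cos_y = cosine_table(height, ny);
    let mut out = Vec::with_capacity(width * height);
    for y in 0..height {
        // Subslice for this row's factors.
        let cy = &cos_y[y * ny..(y + 1) * ny];
        for x in 0..width {
            let cx = &cos_x[x * nx..(x + 1) * nx];
            let mut value = 0.0;
            for j in 0..ny {
                for i in 0..nx {
                    value += factors[j * nx + i] * cx[i] * cy[j];
                }
            }
            out.push(value);
        }
    }
    out
}
```

The table results differ from calling `cos` directly only by floating-point rounding, so the output matches a naive per-pixel evaluation to within a small tolerance.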

codecov · 2024-07-08T09:25:54Z

Codecov Report

Attention: Patch coverage is 96.29630% with 2 lines in your changes missing coverage. Please review.

Project coverage is 87.97%. Comparing base (53c0bbe) to head (f49f21d).

Files	Patch %	Lines
src/base83.rs	93.33%	1 Missing ⚠️
src/lib.rs	96.87%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main      #16      +/-   ##
==========================================
+ Coverage   87.00%   87.97%   +0.97%     
==========================================
  Files           6        6              
  Lines         300      316      +16     
==========================================
+ Hits          261      278      +17     
+ Misses         39       38       -1


rubdos · 2024-07-08T09:35:19Z

encode data/octocat.png time:   [647.05 µs 648.14 µs 649.39 µs]
                        change: [-74.104% -73.880% -73.483%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe

Oops.

rubdos · 2024-07-08T12:30:19Z

If we want more speed, we'll have to change the algorithm to use a fast, FFT-style algorithm for the DCT. Not currently in the mood for that. But I think with this PR, we probably have the fastest blurhash implementation around :'-)
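For reference, these are the sums in question, as defined by the reference BlurHash algorithm. Here $p$ is one linear-light colour channel, $W \times H$ the image size, and $n_x \times n_y$ the number of components. Decoding evaluates

$$
p(x, y) = \sum_{j=0}^{n_y-1} \sum_{i=0}^{n_x-1} c_{ij} \cos\left(\frac{\pi i x}{W}\right) \cos\left(\frac{\pi j y}{H}\right)
$$

and encoding computes each factor as

$$
c_{ij} = \frac{N_{ij}}{W H} \sum_{y=0}^{H-1} \sum_{x=0}^{W-1} p(x, y) \cos\left(\frac{\pi i x}{W}\right) \cos\left(\frac{\pi j y}{H}\right),
\qquad
N_{ij} = \begin{cases} 1 & i = j = 0 \\ 2 & \text{otherwise} \end{cases}
$$

The cosine factors depend only on $(i, x)$ or $(j, y)$, which is what makes precomputed cosine tables possible. Evaluated directly, either direction still costs $O(W H n_x n_y)$ multiply-adds.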

We should probably make this WASM-compatible and make a demo, like https://github.com/fpapado/blurhash-rust-wasm did before. Cc @fpapado

TL;DR: decoding takes ~60% less time (8 ms for 512x512, 80 µs for 50x50) and encoding ~77% less (688 µs for octocat.png), measured on my 7840U.

Benchmark result dump

decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/50
                        time:   [76.581 µs 77.061 µs 77.600 µs]
                        change: [-59.508% -58.760% -57.587%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/50
                        time:   [76.220 µs 76.484 µs 76.767 µs]
                        change: [-59.812% -59.648% -59.485%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/100
                        time:   [311.30 µs 312.27 µs 313.32 µs]
                        change: [-61.161% -60.924% -60.675%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/100
                        time:   [308.58 µs 310.34 µs 312.30 µs]
                        change: [-61.024% -60.820% -60.608%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  11 (11.00%) low severe
  3 (3.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

Benchmarking decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/200
                        time:   [1.2214 ms 1.2257 ms 1.2309 ms]
                        change: [-61.385% -61.246% -61.108%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.4s, enable flat sampling, or reduce sample count to 60.
decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/200
                        time:   [1.2470 ms 1.2542 ms 1.2617 ms]
                        change: [-59.459% -59.035% -58.366%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/256
                        time:   [2.0413 ms 2.0476 ms 2.0543 ms]
                        change: [-60.866% -60.714% -60.558%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/256
                        time:   [1.9808 ms 1.9856 ms 1.9908 ms]
                        change: [-60.847% -60.733% -60.626%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  19 (19.00%) high severe

decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/500
                        time:   [7.7440 ms 7.7812 ms 7.8194 ms]
                        change: [-61.459% -61.235% -61.001%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/500
                        time:   [7.8458 ms 7.8854 ms 7.9258 ms]
                        change: [-60.554% -60.240% -59.941%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

decode LEHLk~WB2yk8pyo0adR*.7kCMdnj/512
                        time:   [8.0230 ms 8.0522 ms 8.0823 ms]
                        change: [-60.630% -60.468% -60.301%] (p = 0.00 < 0.05)
                        Performance has improved.

decode_into LEHLk~WB2yk8pyo0adR*.7kCMdnj/512
                        time:   [7.9280 ms 7.9577 ms 7.9909 ms]
                        change: [-61.498% -61.274% -61.030%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./50
                        time:   [78.969 µs 79.119 µs 79.280 µs]
                        change: [-59.349% -58.869% -58.112%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  6 (6.00%) high severe

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./50
                        time:   [77.031 µs 77.480 µs 77.977 µs]
                        change: [-59.693% -59.190% -58.393%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./100
                        time:   [309.55 µs 310.95 µs 312.54 µs]
                        change: [-59.833% -59.728% -59.614%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
  9 (9.00%) high severe

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./100
                        time:   [303.18 µs 303.78 µs 304.42 µs]
                        change: [-61.100% -60.958% -60.826%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
  8 (8.00%) high mild
  10 (10.00%) high severe

Benchmarking decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60.
decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [1.2103 ms 1.2123 ms 1.2147 ms]
                        change: [-61.186% -61.009% -60.820%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) high mild
  4 (4.00%) high severe

Benchmarking decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.2s, enable flat sampling, or reduce sample count to 60.
decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [1.2016 ms 1.2029 ms 1.2044 ms]
                        change: [-60.875% -60.521% -59.891%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) low mild
  3 (3.00%) high severe

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./256
                        time:   [1.9795 ms 1.9854 ms 1.9919 ms]
                        change: [-60.766% -60.640% -60.505%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./256
                        time:   [2.0311 ms 2.0359 ms 2.0411 ms]
                        change: [-59.577% -59.467% -59.359%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  18 (18.00%) high mild
  1 (1.00%) high severe

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./500
                        time:   [7.4842 ms 7.4936 ms 7.5048 ms]
                        change: [-61.239% -61.132% -61.028%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  5 (5.00%) high severe

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./500
                        time:   [7.7484 ms 7.7664 ms 7.7861 ms]
                        change: [-59.656% -59.513% -59.368%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  9 (9.00%) high severe

decode LGF5]+Yk^6#M@-5c,1J5@[or[Q6./512
                        time:   [8.1303 ms 8.1514 ms 8.1738 ms]
                        change: [-60.495% -60.316% -60.125%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./512
                        time:   [7.8848 ms 7.9084 ms 7.9352 ms]
                        change: [-61.865% -61.721% -61.575%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) high mild
  11 (11.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/50
                        time:   [78.130 µs 78.562 µs 79.062 µs]
                        change: [-60.030% -59.837% -59.660%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 25 outliers among 100 measurements (25.00%)
  4 (4.00%) low severe
  14 (14.00%) low mild
  5 (5.00%) high mild
  2 (2.00%) high severe

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/50
                        time:   [76.434 µs 76.588 µs 76.786 µs]
                        change: [-61.827% -61.258% -60.850%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  2 (2.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/100
                        time:   [310.06 µs 310.75 µs 311.60 µs]
                        change: [-62.352% -61.424% -60.901%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/100
                        time:   [309.55 µs 311.01 µs 312.56 µs]
                        change: [-61.359% -60.753% -60.079%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  6 (6.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe

Benchmarking decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/200
                        time:   [1.2074 ms 1.2115 ms 1.2157 ms]
                        change: [-60.847% -60.468% -59.621%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  4 (4.00%) high mild
  6 (6.00%) high severe

Benchmarking decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/200
                        time:   [1.2369 ms 1.2449 ms 1.2525 ms]
                        change: [-61.300% -60.830% -60.125%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/256
                        time:   [2.0499 ms 2.0565 ms 2.0635 ms]
                        change: [-59.367% -59.214% -59.049%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  5 (5.00%) high mild

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/256
                        time:   [2.0361 ms 2.0413 ms 2.0468 ms]
                        change: [-60.688% -60.562% -60.429%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 100 measurements (23.00%)
  1 (1.00%) low mild
  15 (15.00%) high mild
  7 (7.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/500
                        time:   [7.7494 ms 7.7675 ms 7.7866 ms]
                        change: [-60.686% -60.568% -60.449%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  14 (14.00%) high mild

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/500
                        time:   [7.6974 ms 7.7092 ms 7.7231 ms]
                        change: [-61.733% -61.535% -61.357%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) high mild
  5 (5.00%) high severe

decode L6Pj0^jE.AyE_3t7t7R**0o#DgR4/512
                        time:   [8.0575 ms 8.1053 ms 8.1577 ms]
                        change: [-60.247% -59.996% -59.737%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild

decode_into L6Pj0^jE.AyE_3t7t7R**0o#DgR4/512
                        time:   [7.8932 ms 7.9117 ms 7.9319 ms]
                        change: [-61.642% -61.465% -61.283%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 100 measurements (24.00%)
  3 (3.00%) low mild
  5 (5.00%) high mild
  16 (16.00%) high severe

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/50
                        time:   [79.184 µs 79.479 µs 79.779 µs]
                        change: [-59.620% -59.399% -59.192%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/50
                        time:   [76.670 µs 76.892 µs 77.165 µs]
                        change: [-61.079% -60.448% -60.071%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/100
                        time:   [311.67 µs 312.63 µs 313.67 µs]
                        change: [-61.709% -60.983% -60.272%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  6 (6.00%) high mild
  3 (3.00%) high severe

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/100
                        time:   [308.45 µs 311.43 µs 314.53 µs]
                        change: [-60.671% -59.966% -59.062%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  4 (4.00%) high mild
  2 (2.00%) high severe

Benchmarking decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 60.
decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/200
                        time:   [1.2453 ms 1.2511 ms 1.2570 ms]
                        change: [-60.492% -60.339% -60.195%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  13 (13.00%) low mild
  5 (5.00%) high mild
  1 (1.00%) high severe

Benchmarking decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/200: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/200
                        time:   [1.2415 ms 1.2439 ms 1.2464 ms]
                        change: [-59.743% -59.632% -59.525%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/256
                        time:   [2.0237 ms 2.0368 ms 2.0506 ms]
                        change: [-60.193% -59.879% -59.549%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/256
                        time:   [2.0165 ms 2.0284 ms 2.0412 ms]
                        change: [-60.734% -60.403% -60.061%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  6 (6.00%) high mild
  1 (1.00%) high severe

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/500
                        time:   [7.7285 ms 7.7631 ms 7.7992 ms]
                        change: [-60.265% -60.004% -59.757%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/500
                        time:   [7.7923 ms 7.8177 ms 7.8433 ms]
                        change: [-59.608% -59.445% -59.276%] (p = 0.00 < 0.05)
                        Performance has improved.

decode LKO2:N%2Tw=w]~RBVZRi};RPxuwH/512
                        time:   [8.1224 ms 8.1486 ms 8.1750 ms]
                        change: [-60.953% -60.735% -60.530%] (p = 0.00 < 0.05)
                        Performance has improved.

decode_into LKO2:N%2Tw=w]~RBVZRi};RPxuwH/512
                        time:   [7.9353 ms 7.9662 ms 7.9982 ms]
                        change: [-61.036% -60.846% -60.636%] (p = 0.00 < 0.05)
                        Performance has improved.

     Running benches/encode.rs (target/release/deps/encode-885ad83d48e0fb49)
encode data/SIPI_Jelly_Beans.tiff
                        time:   [697.87 µs 700.42 µs 703.01 µs]
                        change: [-77.324% -77.223% -77.116%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

encode data/octocat.png time:   [683.95 µs 688.12 µs 692.70 µs]
                        change: [-77.643% -77.402% -77.017%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe

rubdos · 2024-07-08T12:42:25Z

diff --git a/src/util.rs b/src/util.rs
index b79e13c..28e2134 100644
--- a/src/util.rs
+++ b/src/util.rs
@@ -4,12 +4,12 @@ include!(concat!(env!("OUT_DIR"), "/srgb_lookup.rs"));
 pub fn linear_to_srgb(value: f32) -> u8 {
     let v = f32::max(0., f32::min(1., value));
     if v <= 0.003_130_8 {
-        (v * 12.92 * 255. + 0.5).round() as u8
+        (v * 12.92 * 255. + 0.5) as u8
     } else {
         // The original C implementation uses this formula:
         // ((1.055 * f32::powf(v, 1. / 2.4) - 0.055) * 255. + 0.5).round() as u8
         // But we can distribute the latter multiplication, to reduce the number of operations:
-        ((1.055 * 255.) * f32::powf(v, 1. / 2.4) - (0.055 * 255. - 0.5)).round() as u8
+        ((1.055 * 255.) * f32::powf(v, 1. / 2.4) - (0.055 * 255. - 0.5)) as u8
     }
 }

decode_into LGF5]+Yk^6#M@-5c,1J5@[or[Q6./200
                        time:   [1.0087 ms 1.0139 ms 1.0194 ms]
                        change: [-17.801% -15.870% -13.643%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  7 (7.00%) high mild
  8 (8.00%) high severe

... let's not.
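Dropping `.round()` in the diff above is not purely a speed change. For non-negative `x`, `(x + 0.5) as u8` rounds to the nearest integer (ties up), because `as` truncates toward zero. `(x + 0.5).round()` instead rounds a value that has already been shifted by one half, so it lands one step higher whenever the fractional part of `x` is below 0.5. For example, `0.0` maps to 1 rather than 0. The two variants can therefore produce different pixel values. A minimal comparison on plain `f32` inputs (the helper names are hypothetical):

```rust
/// Expression before the diff: add 0.5, round half away from zero, then cast.
/// For x >= 0 this yields floor(x) + 1 (saturating at 255, since float-to-int
/// `as` casts saturate).
fn add_half_then_round(x: f32) -> u8 {
    (x + 0.5).round() as u8
}

/// Expression after the diff: add 0.5 and let `as u8` truncate,
/// i.e. round to nearest with ties going up for x >= 0.
fn add_half_then_truncate(x: f32) -> u8 {
    (x + 0.5) as u8
}
```

Both agree when the fractional part is at least 0.5 (e.g. `1.6` gives 2 either way) and disagree below it (e.g. `1.2` gives 2 versus 1).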

gferon

Looks like you had fun, this is always welcome 🚀

rubdos force-pushed the performance branch from 1e18620 to b3806d1 July 7, 2024 16:00

rubdos force-pushed the performance branch from 4a32563 to 3741460 July 8, 2024 09:05

rubdos and others added 15 commits July 8, 2024 11:21

Preallocate factors vector

539a3ed

Preallocate whole blurhash string

a2496c7

build: Move write_srgb in a function

9f4697a

Write base83 characters list through build script

fdcd7cd

Precompute height/width inverses

71417af

Pull pi*x/width and pi*y/height out of hot loop

5e18d52

Generate base83 inverse character map

7ad8f2e

Remove a multiplication for sign_pow

08a04bc

Reduce return size for linear_to_srgb

f26590c

Precompute cosines outside of hot loop

c44d605

Pull pi_x_width/pi_y_height out of hot loop

428d553

Use precomputation tables for cosines in decode

bd7d2f8

Use subslice for accessing cosine table

ab0520e

Co-authored-by: Thibaut Vandervelden <thvdveld@vub.be>

Use f32::copysign instead of manual branched assignment

8efa5cf

Optimization in linear_to_srgb

f49f21d
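Two of the decode-side commits above can be illustrated in isolation. The first replaces the sign branch (or sign multiplication) in `sign_pow` with `f32::copysign`. The second decodes base83 digits through a precomputed inverse character map instead of searching the alphabet. The commit titles suggest the PR generates its tables from a build script; a `const` initializer is used here only to keep the sketch self-contained. `INVERSE` and `decode83` are hypothetical names, and this is not necessarily the crate's exact code.

```rust
/// Branch-free signed power: |val|^exp carrying the sign of `val`.
/// `copysign` replaces an `if val < 0.0 { -r } else { r }` branch.
fn sign_pow(val: f32, exp: f32) -> f32 {
    val.abs().powf(exp).copysign(val)
}

/// The 83-character BlurHash alphabet.
const CHARACTERS: [u8; 83] =
    *b"0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz#$%*+,-.:;=?@[]^_{|}~";

/// Inverse map from byte value to digit value, built at compile time.
/// 0xFF marks bytes that are not part of the alphabet.
const INVERSE: [u8; 256] = {
    let mut table = [0xFFu8; 256];
    let mut i = 0;
    while i < CHARACTERS.len() {
        table[CHARACTERS[i] as usize] = i as u8;
        i += 1;
    }
    table
};

/// Decode a short base83 group (BlurHash uses at most 4 characters per
/// group, so u32 cannot overflow) with one table lookup per character.
fn decode83(s: &str) -> Option<u32> {
    s.bytes().try_fold(0u32, |acc, b| match INVERSE[b as usize] {
        0xFF => None,
        digit => Some(acc * 83 + digit as u32),
    })
}
```

For instance, `decode83("L")` returns `Some(21)`, the size flag for 4x3 components used by the benchmark hashes.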

rubdos force-pushed the performance branch from 3741460 to f49f21d July 8, 2024 09:22

rubdos added 2 commits July 8, 2024 11:29

Faster computation of maximum

1cca59d

Encode: precompute cosines

a89f131
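The first of these two commits concerns the encoder's maximum AC magnitude, which is used to normalise and quantise the AC components. One single-pass, allocation-free way to compute it is sketched below. The `[f32; 3]`-per-component layout (DC first) and the name `max_ac` are assumptions, and the PR's actual change may differ.

```rust
/// Largest absolute value over all AC components, i.e. every factor except
/// the first (DC) one, across all three channels. Single pass, no allocation.
fn max_ac(factors: &[[f32; 3]]) -> f32 {
    factors
        .iter()
        .skip(1) // skip the DC component
        .flat_map(|rgb| rgb.iter())
        .fold(0.0_f32, |max, v| max.max(v.abs()))
}
```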

rubdos force-pushed the performance branch from 5eea83b to b856245 July 8, 2024 12:28

asserts as hints for the optimizer, use zip as iterator

73fa250
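This commit's title names two common Rust patterns for keeping bounds checks out of hot loops. The following is a minimal illustration on a hypothetical dot-product helper, not code from this PR. Whether the optimizer actually removes a given check depends on the compiler, so treat this as a pattern rather than a guarantee.

```rust
/// Indexed version: the up-front assert tells the optimizer that every
/// `b[i]` is in bounds, which can let it drop the per-iteration check.
/// A violating caller panics once, before the loop starts.
fn dot_indexed(a: &[f32], b: &[f32]) -> f32 {
    assert!(b.len() >= a.len());
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i];
    }
    sum
}

/// Iterator version: `zip` stops at the shorter slice, so there is no
/// indexing and therefore no bounds check to eliminate.
fn dot_zip(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Note the behavioural difference: `dot_zip` silently ignores the extra elements of the longer slice, whereas `dot_indexed` panics if `b` is too short.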

rubdos force-pushed the performance branch from b856245 to 73fa250 July 8, 2024 12:29

rubdos requested a review from gferon July 8, 2024 12:44

gferon approved these changes Jul 8, 2024


rubdos merged commit 85938c4 into main Jul 8, 2024
5 checks passed

gferon deleted the performance branch July 23, 2024 09:09
